Fix min and max mz filtering in cut_mz_domain_noise#13
Fix min and max mz filtering in cut_mz_domain_noise#13Kzra wants to merge 2 commits intoEMSL-Computing:masterfrom
Conversation
|
mz should be ordered from low to high mz by default now, so this may not be required anymore. |
There was a problem hiding this comment.
Pull request overview
Fixes how the noise ROI (min/max m/z) is selected in NoiseThresholdCalc.cut_mz_domain_noise() so min_noise_mz / max_noise_mz actually affect baseline noise calculations (notably for threshold_method='signal_noise').
Changes:
- Adjusted how
low_mz_index/high_mz_indexare chosen frommz_exp_profilefor ROI slicing. - Fixed the slice order/consistency in the
if high_mz_index > low_mz_indexreturn path.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #the following indexing relies on mz_exp_profile being ordered high mz to low mz | ||
| low_mz_index = (where(self.mz_exp_profile >= min_mz_noise)[-1][-1]) | ||
| #print(self.mz_exp_profile[low_mz_index]) | ||
| #low_mz_index = (argmax(self.mz_exp_profile <= min_mz_noise)) | ||
|
|
||
| high_mz_index = (where(self.mz_exp_profile <= max_mz_noise)[0][0]) | ||
| #print(self.mz_exp_profile[high_mz_index]) |
There was a problem hiding this comment.
The new where(...)[-1][-1] / [0][0] selection assumes mz_exp_profile is sorted high→low. If mz_exp_profile is low→high (which the if high_mz_index > low_mz_index branch suggests is possible, and can happen for array-based inputs), low_mz_index becomes the last element and the slice includes values outside [min_mz_noise, max_mz_noise], so min/max filtering breaks again. Consider making the index selection order-independent, e.g., build a boolean mask for (mz>=min)&(mz<=max) and slice using min(idx):max(idx)+1, which also lets you drop the order-dependent comment and the if/else.
| if high_mz_index > low_mz_index: | ||
| # pyplot.plot(self.mz_exp_profile[low_mz_index:high_mz_index], self.abundance_profile[low_mz_index:high_mz_index]) | ||
| # pyplot.show() | ||
| return self.mz_exp_profile[high_mz_index:low_mz_index], self.abundance_profile[low_mz_index:high_mz_index] | ||
| return self.mz_exp_profile[low_mz_index:high_mz_index], self.abundance_profile[low_mz_index:high_mz_index] |
There was a problem hiding this comment.
This change affects how noise ROI is selected for all non-'minima' threshold methods, but there doesn’t appear to be a unit test that asserts min_noise_mz/max_noise_mz actually change the computed noise baseline/STD (especially for both ascending and descending mz_exp_profile). Adding a focused test around cut_mz_domain_noise() (or run_noise_threshold_calc() with threshold_method='signal_noise') would prevent regressions like the original bug and the ordering edge case.
I noticed that changing the min_mz_noise and max_mz_noise parameters did not effect the baseline noise calculation when the threshold method was 'signal_noise'.
This seems to be fixed by changing the indexing in the cut_mz_domain_noise method in the NoiseThresholdCalc class.
Because mz_exp_profile was ordered from high to low in my tests, the indexing in the following commands needed to be reversed.
So I changed
to:
I saw that there was an if statement at the end of the function which compares the low_mz_index and high_mz_index to determine the order of the slice. This implies that the order of mz_exp_profile is not always fixed, in which case a more flexible solution is required. Happy to implement this if this is the case, otherwise I think the current fix will be fine.
I also changed the if statement at the end of the function which had the low_mz_index and high_mz_index variables in the wrong order.
EDIT 07/03/23: Changed the first index to -1 for low_mz_index in case the mz_exp_profile is a 2D array